Data sanitization in association rule mining based on impact factor

Authors

  • A. Shahbahrami Department of Computer Engineering, University of Guilan, Rasht, Iran.
  • A. Telikani Department of Electronic & Computer Engineering, Institute for Higher Education Pouyandegan Danesh, Chalous, Iran.
  • R. Tavoli Department of Mathematics, Chalous Branch, Islamic Azad University, Chalous, Iran.
Abstract:

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, thereby preserving data confidentiality against association rule mining. This process relies strongly on minimizing the impact of sanitization on data utility, i.e., on minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, from the database while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on non-sensitive itemsets. The proposed algorithm is compared with the Sliding Window Algorithm (SWA) and Max-Min1 in terms of execution time, data utility, and data accuracy. Data accuracy is defined as the ratio of deleted items to the total support values of the sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in maximizing data utility and data accuracy, and that it provides better execution time than SWA and Max-Min1 at high scalability in the number of sensitive itemsets and transactions.
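The general idea described in the abstract can be sketched in code: lower the support of each sensitive itemset below the mining threshold by deleting items from supporting transactions, and choose the victim item whose deletion touches the fewest non-sensitive frequent itemsets. This is a minimal illustrative sketch, not the authors' algorithm; the greedy deletion strategy, the `impact_factor` heuristic, and all names and data are assumptions.

```python
def support(db, itemset):
    """Number of transactions that contain every item of itemset."""
    s = set(itemset)
    return sum(1 for t in db if s.issubset(t))

def impact_factor(item, nonsensitive):
    """Estimated impact of deleting an item from one transaction:
    how many non-sensitive frequent itemsets contain that item."""
    return sum(1 for fs in nonsensitive if item in fs)

def sanitize(db, sensitive, nonsensitive, min_sup):
    """Delete lowest-impact items from supporting transactions until
    every sensitive itemset falls below the support threshold."""
    db = [set(t) for t in db]
    for s in sensitive:
        s = set(s)
        while support(db, s) >= min_sup:
            # pick any transaction that still supports the sensitive itemset
            t = next(t for t in db if s.issubset(t))
            # delete the item of s with the smallest estimated impact
            # (sorted() makes the tie-break deterministic)
            victim = min(sorted(s), key=lambda i: impact_factor(i, nonsensitive))
            t.discard(victim)
    return db

# Illustrative toy data, not taken from the paper
db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
sensitive = [('a', 'b')]                              # itemset to hide
nonsensitive = [frozenset('ac'), frozenset('bc')]     # patterns to protect
released = sanitize(db, sensitive, nonsensitive, min_sup=2)
```

In this toy run, one deletion suffices: the support of the sensitive itemset {a, b} drops below the threshold, while the non-sensitive itemset {b, c} keeps its full support, which is the utility-preserving behavior the impact factor aims for.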


Similar resources


Generalized Association Rule Mining Algorithms Based on Multidimensional Data

This paper proposes a new formalized definition of generalized association rules based on multidimensional data. The algorithms, named BorderLHSs and GenerateLHSs-Rule, are designed to generate generalized association rules from multi-level frequent itemsets over multidimensional data. Experiments show that the algorithms proposed in this paper are more efficient and generate fewer redundant r...


Association Rule Mining on Distributed Data

Applications requiring large-scale data processing face two major problems as the amount of data increases: first, huge storage and its management, and second, processing time. Distributed databases solve the first problem to a great extent, but the second problem grows. Since the current era is one of networking and communication, and people are interested in keeping large data on networks, researc...


Privacy Preserving Association Rule Mining based on the Intersection Lattice and Impact Factor of Items

Association rules revealed by association rule mining may contain some sensitive rules, which may pose potential threats to privacy and security. A number of researchers in this area have recently made efforts to preserve privacy for sensitive association rules in transactional databases. In this paper, we put forward a heuristic-based association rule hiding algorithm to get rid of t...


Association Rule Mining Based On Trade List

In this paper a new mining algorithm is defined based on frequent itemsets. The Apriori algorithm scans the database every time it searches for frequent itemsets, so it is very time consuming, and at each step it generates candidate itemsets. For large databases it therefore takes a lot of space to store the candidate itemsets. The undirected item-set graph is an improvement on Apriori, but it takes time and sp...


Journal title

Volume 3, Issue 2

Pages 131-140

Publication date: 2015-10-01
